Abstract

We study the problem of identifying the top m arms in a multi-armed bandit game. Our
proposed solution relies on a new algorithm based on successive rejects of the seemingly bad
arms, and successive accepts of the good ones. This algorithmic contribution allows us to tackle
other multiple-identification settings that were previously out of reach. In particular we show
that this idea of successive accepts and rejects applies to the multi-bandit best arm identification
problem.



1 Introduction 

We are interested in the following situation: an agent faces $K$ unknown distributions, and he is
allowed to do $n$ sequential evaluations of the form $(i, X)$, where $i \in \{1, \dots, K\}$ is chosen by the
agent and $X$ is a random variable drawn from the $i$-th distribution and revealed to the agent. The
goal of the agent after the $n$ evaluations is to identify a subset of the distributions (or arms in the
multi-armed bandit terminology) corresponding to some prespecified criterion. This setting was
introduced in Bubeck et al. [2009], where the goal was to identify the distribution with maximal
mean. Note that in this formulation of the problem the evaluation budget $n$ is fixed. Another
possible formulation is the one of the PAC model studied in Even-Dar et al. [2002] and Mannor and
Tsitsiklis [2004], where there is an accuracy $\epsilon$ and a probability of correctness $\delta$ that are
prespecified, and one wants to minimize the number of evaluations to attain this prespecified accuracy
and probability of correctness. This latter formulation has a long history which goes back to the
seminal work of Bechhofer [1954]. In this paper we focus on the fixed budget setting of Bubeck
et al. [2009]. For this fixed budget problem, Audibert et al. [2010] proposed a new analysis and
an optimal algorithm (up to a logarithmic factor). In particular this work introduced a notion of
best arm identification complexity, and it was shown that this quantity, denoted $H$, characterizes
the hardness of identifying the best distribution in a specific set of $K$ distributions. Intuitively, it
was shown that the number of evaluations $n$ has to be $\Omega(H / \log K)$ to be able to find the best arm,
and the algorithm SR (Successive Rejects) finds it with $O(H \log^2 K)$ evaluations. Furthermore in
the latter paper the authors also suggested the open problem of generalizing the analysis and
algorithms to the identification of the $m$ distributions with the top $m$ means. Our main contribution
is to solve this open problem. We suggest a non-trivial extension of the complexity $H$, denoted
$H^{\langle m \rangle}$, to the problem of identifying the top $m$ distributions, and we introduce a new algorithm,
called SAR (Successive Accepts and Rejects), that requires only $\tilde{O}(H^{\langle m \rangle})$ evaluations¹ to find the
top $m$ arms. We also propose a numerical comparison between SAR, SR and uniform sampling for
the problem of finding the $m$ top arms. Interestingly the experiments show that SR performs badly
for $m > 1$, which shows that the tradeoffs involved in this generalized problem are fundamentally
different from the ones for the single best arm identification.



As a by-product of our new analysis we are also able to solve an open problem of Gabillon
et al. [2011]. In that paper the authors studied the setting where the agent faces $M$ distinct best arm
identification problems. A multi-bandit identification complexity was introduced, that we denote
$H^{[M]}$. In contrast to the setting of single best arm identification, here the algorithm proposed in
Gabillon et al. [2011], which needs of order of $H^{[M]}$ evaluations to find the best arm in each bandit,
requires knowledge of the complexity $H^{[M]}$ to tune its parameters. Using our SAR machinery, we
construct a parameter-free algorithm that identifies the best arm in each bandit with $\tilde{O}(H^{[M]})$
evaluations².



Both the m-best arms identification and the multi-bandit best arm identification have numerous 
potential applications. We refer the interested reader to the previously cited papers for several 
examples. 



2 Problem setup 

We adopt the terminology of multi-armed bandits. The agent faces $K$ arms and he has a budget of
$n$ evaluations (or pulls). To each arm $i \in \{1, \dots, K\}$ there is an associated probability distribution
$\nu_i$, supported³ on $[0, 1]$. These distributions are unknown to the agent. The sequential evaluations
protocol goes as follows: at each round $t = 1, \dots, n$, the agent chooses an arm $I_t$, and observes
a reward drawn from $\nu_{I_t}$ independently from the past given $I_t$. In the m-best arms identification
problem, at the end of the $n$ evaluations, the agent selects $m$ arms denoted $J_1, \dots, J_m$. The
objective of the agent is that the set $\{J_1, \dots, J_m\}$ corresponds to the set of arms with the $m$ highest
mean rewards.

¹ In the m-best arms identification problem we write $u_n = \tilde{O}(v_n)$ when $u_n = O(v_n)$ up to a logarithmic factor in $K$.
² In the multi-bandit best arm identification problem we write $u_n = \tilde{O}(v_n)$ when $u_n = O(v_n)$ up to a logarithmic factor in $MK$.
³ One can directly generalize the discussion to $\sigma$-subgaussian distributions.



Denote by $\mu_1, \dots, \mu_K$ the means of the arms. In the following we assume that $\mu_1 > \dots > \mu_K$.
The ordering assumption comes without loss of generality, and the assumption that the means are
all distinct is made for the sake of notation (the complexity measures are slightly different if there is
an ambiguity for the top $m$ means). We evaluate the performance of the agent's strategy by the
probability of misidentification, that is

$$e_n = P\big(\{J_1, \dots, J_m\} \neq \{1, \dots, m\}\big).$$

Finer measures of performance can be proposed, such as the simple regret $r_n = \sum_{i=1}^{m} (\mu_i - \mu_{J_i})$.
However, as it was argued in Audibert et al. [2010], for a first order analysis it is enough to focus
on the quantity $e_n$.



In the (single) best arm identification, Audibert et al. [2010] introduced the following complexity
measures. Let $\Delta_i = \mu_1 - \mu_i$ for $i \neq 1$, $\Delta_1 = \mu_1 - \mu_2$,

$$H_1 = \sum_{i=1}^{K} \frac{1}{\Delta_i^2} \qquad \text{and} \qquad H_2 = \max_{i \in \{1, \dots, K\}} i \, \Delta_i^{-2}.$$

It is easy to see that these two complexity measures are equivalent up to a logarithmic factor since
we have (see Audibert et al. [2010])

$$H_2 \le H_1 \le \log(2K) \, H_2. \qquad (1)$$
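As a small illustration of these definitions, the following Python sketch computes $H_1$ and $H_2$ for a hypothetical vector of means and checks inequality (1) numerically. The function name `best_arm_complexities` and the example means are assumptions made for illustration only; they do not come from the cited papers.

```python
import math

def best_arm_complexities(means):
    """Return (H1, H2) for single best arm identification.

    `means` is a list of arm means with a unique maximum (illustrative input).
    """
    mu = sorted(means, reverse=True)      # mu[0] > mu[1] >= ... >= mu[K-1]
    gaps = [mu[0] - x for x in mu]        # Delta_i = mu_1 - mu_i for i != 1
    gaps[0] = mu[0] - mu[1]               # Delta_1 = mu_1 - mu_2
    h1 = sum(1.0 / g ** 2 for g in gaps)
    # With means sorted in decreasing order the gaps are already ascending,
    # so H2 = max_i i * Delta_i^{-2} with 1-indexed i.
    h2 = max((i + 1) / g ** 2 for i, g in enumerate(gaps))
    return h1, h2

means = [0.5, 0.45, 0.4, 0.3]             # hypothetical example
K = len(means)
h1, h2 = best_arm_complexities(means)
print("H1 =", h1, "H2 =", h2)
print("inequality (1) holds:", h2 <= h1 <= math.log(2 * K) * h2)
```

Here the logarithm in (1) is taken to be the natural logarithm, which is an assumption of this sketch.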



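[Theorem 4, Audibert et al. [2010]] shows that the complexity $H_1$ represents the hardness of the
best arm identification problem. However, as far as upper bounds are concerned, the quantity $H_2$
proved to be a useful surrogate for $H_1$. For the m-best arms identification problem we define the
following gaps and the associated complexity measures:

$$\Delta_i^{\langle m \rangle} = \begin{cases} \mu_i - \mu_{m+1} & \text{if } i \le m, \\ \mu_m - \mu_i & \text{if } i > m, \end{cases}$$

$$H_1^{\langle m \rangle} = \sum_{i=1}^{K} \frac{1}{\big(\Delta_i^{\langle m \rangle}\big)^2}, \qquad H_2^{\langle m \rangle} = \max_{i \in \{1, \dots, K\}} i \, \big(\Delta_{(i)}^{\langle m \rangle}\big)^{-2},$$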



where the notation $(i) \in \{1, \dots, K\}$ is defined such that $\Delta_{(1)}^{\langle m \rangle} \le \dots \le \Delta_{(K)}^{\langle m \rangle}$. We conjecture that
a similar lower bound to [Theorem 4, Audibert et al. [2010]] with $H_1$ replaced by $H_1^{\langle m \rangle}$ holds true
for the m-best arms identification problem. In this paper we shall prove an upper bound on $e_n$
that gets small when $n = \tilde{O}\big(H_2^{\langle m \rangle}\big)$ (recall that by (1), $\tilde{O}\big(H_2^{\langle m \rangle}\big) = \tilde{O}\big(H_1^{\langle m \rangle}\big)$). This result is
derived in Section 3, where we introduce our key algorithmic contribution, the SAR (Successive
Accepts and Rejects) algorithm. We also present experiments for this setting in Section 5.
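The gaps $\Delta_i^{\langle m \rangle}$ and the two complexity measures for the m-best arms problem can be computed directly from their definitions. The sketch below is only an illustration: the function name `m_best_complexities` and the example means are hypothetical, and it assumes $1 \le m < K$ and distinct means around the boundary between arms $m$ and $m+1$.

```python
def m_best_complexities(means, m):
    """Return (gaps, H1, H2) for m-best arms identification (assumes 1 <= m < K)."""
    mu = sorted(means, reverse=True)      # mu[0] is mu_1, mu[m-1] is mu_m, mu[m] is mu_{m+1}
    K = len(mu)
    # Top-m arms are measured against mu_{m+1}; the others against mu_m.
    gaps = [mu[i] - mu[m] if i < m else mu[m - 1] - mu[i] for i in range(K)]
    ordered = sorted(gaps)                # Delta_(1) <= ... <= Delta_(K)
    h1 = sum(1.0 / g ** 2 for g in gaps)
    h2 = max((i + 1) / g ** 2 for i, g in enumerate(ordered))
    return gaps, h1, h2

means = [0.5, 0.45, 0.4, 0.3]             # hypothetical example
for m in (1, 2):
    gaps, h1, h2 = m_best_complexities(means, m)
    print("m =", m, "gaps =", gaps, "H1 =", h1, "H2 =", h2)
```

For $m = 1$ the gaps reduce to the single best arm gaps $\Delta_i$, so the two pairs of complexity measures coincide in that case.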



In Section 4 we consider the multi-bandit framework introduced in Gabillon et al. [2011],
where the agent faces $M$ distinct best arm identification problems. For the sake of notation we assume
that each problem $m \in \{1, \dots, M\}$ has the same number of arms $K$. We also restrict our attention
to the single best arm identification within each problem, but we could deal with m-best arms
identification within each problem. We denote by $\nu_1(m), \dots, \nu_K(m)$ the unknown distributions of
the arms in problem $m$. We define similarly all the relevant quantities for each problem, that is
$\mu_1(m) > \dots > \mu_K(m)$, $\Delta_1(m), \dots, \Delta_K(m)$, $H_1(m)$ and $H_2(m)$. Finally we denote by $(i, m)$ the
arm $i$ in problem $m$. In the multi-bandit best arm identification, the agent performs $n$ sequential
evaluations of the form $(I_t, m_t) \in \{1, \dots, K\} \times \{1, \dots, M\}$. At the end of the $n$ evaluations,
the agent selects one arm for each problem, denoted $(J_1, 1), \dots, (J_M, M)$. The objective of the
agent is to find the arm with the highest mean reward in each problem, that is, in this setting the
probability of misidentification can be written as

$$e_n = P\big(\exists\, m \in \{1, \dots, M\} : J_m \neq 1\big).$$



Following Gabillon et al. [2011] we introduce the following complexity measure

$$H_1^{[M]} = \sum_{m=1}^{M} H_1(m).$$

Again we define a sort of weaker complexity measure by ordering the gaps. Let

$$\Delta_{(1)}^{[M]} \le \Delta_{(2)}^{[M]} \le \dots \le \Delta_{(MK)}^{[M]}$$

be a rearrangement of $\{\Delta_i(m) : 1 \le i \le K, 1 \le m \le M\}$ in ascending order, and let

$$H_2^{[M]} = \max_{k \in \{1, \dots, MK\}} k \, \big(\Delta_{(k)}^{[M]}\big)^{-2}.$$



We conjecture that a similar lower bound to [Theorem 4, Audibert et al. [2010]] with $H_1$
replaced by $H_1^{[M]}$ holds true for the multi-bandit best arm identification problem. In this paper
we shall prove an upper bound on $e_n$ that gets small when $n = \tilde{O}\big(H_2^{[M]}\big)$ (recall that by (1),
$\tilde{O}\big(H_2^{[M]}\big) = \tilde{O}\big(H_1^{[M]}\big)$). This result, derived in Section 4, builds upon the SAR strategy
introduced in Section 3. The improvement with respect to Gabillon et al. [2011] is that our strategy
is parameter-free, while the theoretical Gap-E introduced in Gabillon et al. [2011] requires the
knowledge of $H_1^{[M]}$ to tune its parameter. Moreover the analysis of SAR is much simpler than the
one of Gap-E.
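The multi-bandit complexities can be computed in the same way, by pooling the per-problem gaps. The following sketch is illustrative only: `multi_bandit_complexities` and the example problems are hypothetical names and values, and each problem is assumed to have a unique best arm.

```python
def multi_bandit_complexities(problems):
    """Return (H1_M, H2_M) given a list of mean vectors, one per problem."""
    pooled = []
    h1_total = 0.0
    for means in problems:
        mu = sorted(means, reverse=True)
        gaps = [mu[0] - x for x in mu]    # Delta_i(m) = mu_1(m) - mu_i(m)
        gaps[0] = mu[0] - mu[1]           # Delta_1(m) = mu_1(m) - mu_2(m)
        h1_total += sum(1.0 / g ** 2 for g in gaps)   # adds H_1(m)
        pooled.extend(gaps)
    ordered = sorted(pooled)              # ascending rearrangement of all gaps
    h2 = max((k + 1) / g ** 2 for k, g in enumerate(ordered))
    return h1_total, h2

problems = [[0.5, 0.4], [0.5, 0.3, 0.25]]  # hypothetical example
h1_M, h2_M = multi_bandit_complexities(problems)
print("H1^[M] =", h1_M, "H2^[M] =", h2_M)
```

The sketch does not require every problem to have the same number of arms, although the text makes that assumption for notational convenience.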



For each arm $i$ and all time rounds $t \ge 1$, we denote by $T_i(t) = \sum_{s=1}^{t} \mathbb{1}_{I_s = i}$ the number of
times arm $i$ was pulled from rounds 1 to $t$, and by $X_{i,1}, X_{i,2}, \dots, X_{i,T_i(t)}$ the sequence of associated
rewards. Introduce $\hat{\mu}_{i,s} = \frac{1}{s} \sum_{t=1}^{s} X_{i,t}$, the empirical mean of arm $i$ after $s$ evaluations. Denote by
$X_{i,s}(m)$ and $\hat{\mu}_{i,s}(m)$ the corresponding quantities in the multi-bandit problem.
3 m-best arms identification



In this section we describe and analyze a new algorithm, called SAR (Successive Accepts and
Rejects), for the m-best arms identification problem; see Figure 1 for its precise description. The
idea behind SAR is similar to the one for SR (Successive Rejects) that was designed for the (single)
best arm identification problem, with the additional feature that SAR sometimes accepts an arm
because it is confident enough that this arm is among the $m$ top arms. Informally SAR proceeds as
follows. First the algorithm divides the time (i.e., the $n$ rounds) into $K - 1$ phases. At the end of each
phase, the algorithm either accepts the arm with the highest empirical mean or dismisses the arm
with the lowest empirical mean, and in both cases the corresponding arm is deactivated. During
the next phase, it pulls each active arm equally often. The key to deciding whether to accept or reject
during a certain phase $k$ is to rely on estimates for the gaps $\Delta_i^{\langle m \rangle}$. More precisely, assume that the
algorithm has already accepted $m - m(k)$ arms $J_1, \dots, J_{m - m(k)}$, i.e., there are $m(k)$ arms left to find.
Then, at the end of phase $k$, SAR computes for the $m(k)$ empirical best arms (among the active
arms) the distance (in terms of empirical mean) to the $(m(k) + 1)$-th empirical best arm among the
active arms. On the other hand, for the active arms that are not among the $m(k)$ empirical best
arms, SAR computes the distance to the $m(k)$-th empirical best arm. Finally SAR deactivates the
arm $i_k$ that maximizes these empirical distances. If $i_k$ is currently the empirical best arm, then
SAR accepts $i_k$ and sets $m(k+1) = m(k) - 1$, $J_{m - m(k+1)} = i_k$, and otherwise it simply rejects
$i_k$. The lengths of the phases are chosen similarly to what was done for the SR algorithm.
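The informal description above can be turned into a short simulation. The Python sketch below is one possible reading of the procedure and not a reference implementation: the function name `sar`, the callback `pull`, and the handling of the two boundary cases (no arm left to find, or as many arms left to find as active arms) are assumptions added for illustration. The phase lengths use the schedule $n_k = \lceil (n - K) / (\overline{\log}(K)(K + 1 - k)) \rceil$ with $\overline{\log}(K) = 1/2 + \sum_{i=2}^{K} 1/i$.

```python
import math
import random

def sar(pull, K, m, n):
    """Sketch of SAR. pull(i) returns a reward in [0, 1] for arm i (0-indexed).

    Returns (accepted_arms, total_pulls). Assumes 1 <= m < K and n > K.
    """
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    sums, counts = [0.0] * K, [0] * K
    accepted, m_left, n_prev = [], m, 0
    for k in range(1, K):
        # Boundary cases (assumption of this sketch): nothing left to decide.
        if m_left == 0 or m_left == len(active):
            break
        n_k = math.ceil((n - K) / (log_bar * (K + 1 - k)))
        for i in active:                       # pull each active arm equally often
            for _ in range(n_k - n_prev):
                sums[i] += pull(i)
                counts[i] += 1
        n_prev = n_k
        mean = {i: sums[i] / counts[i] for i in active}
        order = sorted(active, key=lambda i: mean[i], reverse=True)
        gap = {}
        for r, i in enumerate(order):          # r is the 0-indexed empirical rank
            if r < m_left:                     # among the m(k) empirical best
                gap[i] = mean[i] - mean[order[m_left]]
            else:
                gap[i] = mean[order[m_left - 1]] - mean[i]
        i_k = max(active, key=lambda i: gap[i])    # ties: first maximizer
        threshold = mean[order[m_left]]
        active.remove(i_k)
        if mean[i_k] > threshold:              # accept, otherwise reject
            accepted.append(i_k)
            m_left -= 1
    if m_left > 0:                             # remaining active arms are accepted
        accepted.extend(active)
    return accepted, sum(counts)

# Hypothetical usage with Bernoulli arms.
random.seed(0)
mus = [0.5, 0.45, 0.4, 0.3, 0.2]
arms, used = sar(lambda i: float(random.random() < mus[i]), K=5, m=2, n=2000)
print("accepted arms:", sorted(arms), "pulls used:", used)
```

Because of the early exit in the boundary cases, this sketch may use fewer than $n$ pulls; it never uses more.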
Theorem 1 The probability of error of SAR in the m-best arms identification problem satisfies

$$e_n \le 2K^2 \exp\left(- \frac{n - K}{8 \, \overline{\log}(K) \, H_2^{\langle m \rangle}}\right).$$



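Proof Consider the event $\xi$ defined by

$$\xi = \left\{ \forall i \in \{1, \dots, K\},\ k \in \{1, \dots, K-1\},\ \left| \frac{1}{n_k} \sum_{s=1}^{n_k} X_{i,s} - \mu_i \right| \le \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \right\}.$$

By Hoeffding's inequality and a union bound, the probability of the complementary event $\bar{\xi}$ can
be bounded as follows

$$P(\bar{\xi}) \le \sum_{i=1}^{K} \sum_{k=1}^{K-1} P\left( \left| \frac{1}{n_k} \sum_{s=1}^{n_k} X_{i,s} - \mu_i \right| > \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \right) \le \sum_{i=1}^{K} \sum_{k=1}^{K-1} 2 \exp\left( -2 n_k \big( \Delta_{(K+1-k)}^{\langle m \rangle} / 4 \big)^2 \right) \le 2K^2 \exp\left( - \frac{n - K}{8 \, \overline{\log}(K) \, H_2^{\langle m \rangle}} \right),$$

where the last inequality comes from the fact that

$$n_k \big( \Delta_{(K+1-k)}^{\langle m \rangle} \big)^2 \ge \frac{n - K}{\overline{\log}(K) \, (K + 1 - k) \, \big( \Delta_{(K+1-k)}^{\langle m \rangle} \big)^{-2}} \ge \frac{n - K}{\overline{\log}(K) \, H_2^{\langle m \rangle}}.$$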
Let $A_1 = \{1, \dots, K\}$, $m(1) = m$, $\overline{\log}(K) = \frac{1}{2} + \sum_{i=2}^{K} \frac{1}{i}$, $n_0 = 0$ and for $k \in \{1, \dots, K-1\}$,

$$n_k = \left\lceil \frac{1}{\overline{\log}(K)} \, \frac{n - K}{K + 1 - k} \right\rceil.$$

For each phase $k = 1, 2, \dots, K - 1$:

(1) For each active arm $i \in A_k$, select arm $i$ for $n_k - n_{k-1}$ rounds.

(2) Let $\sigma_k : \{1, \dots, K + 1 - k\} \to A_k$ be the bijection that orders the empirical means by
$\hat{\mu}_{\sigma_k(1), n_k} \ge \hat{\mu}_{\sigma_k(2), n_k} \ge \dots \ge \hat{\mu}_{\sigma_k(K+1-k), n_k}$. For $1 \le r \le K + 1 - k$, define the empirical
gaps

$$\hat{\Delta}_{\sigma_k(r), n_k} = \begin{cases} \hat{\mu}_{\sigma_k(r), n_k} - \hat{\mu}_{\sigma_k(m(k)+1), n_k} & \text{if } r \le m(k), \\ \hat{\mu}_{\sigma_k(m(k)), n_k} - \hat{\mu}_{\sigma_k(r), n_k} & \text{if } r \ge m(k) + 1. \end{cases}$$

(3) Let $i_k \in \operatorname{argmax}_{i \in A_k} \hat{\Delta}_{i, n_k}$ (ties broken arbitrarily). Deactivate arm $i_k$, that is set $A_{k+1} = A_k \setminus \{i_k\}$.

(4) If $\hat{\mu}_{i_k, n_k} > \hat{\mu}_{\sigma_k(m(k)+1), n_k}$ then arm $i_k$ is accepted, that is set $m(k+1) = m(k) - 1$ and
$J_{m - m(k+1)} = i_k$.

Output: The $m$ accepted arms $J_1, \dots, J_m$.

Figure 1: SAR (Successive Accepts and Rejects) algorithm for m-best arms identification.



Thus, it suffices to show that on the event $\xi$, the algorithm does not make any error. We prove this
by induction on $k$. Let $k \ge 1$. Assume the algorithm makes no error in all previous $k - 1$ stages.
Note that event $\xi$ implies that at the end of stage $k$, all empirical means are within $\frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle}$ of
the respective true means.

Let $A_k = \{a_1, \dots, a_{K+1-k}\}$ be the set of active arms during phase $k$. We order the $a_i$'s
such that $\mu_{a_1} > \mu_{a_2} > \dots > \mu_{a_{K+1-k}}$. To slightly lighten the notation we denote $m' = m(k)$ for
the number of arms that are left to find in phase $k$. The assumption that no error occurs in the first
$k - 1$ stages implies that

$$a_1, a_2, \dots, a_{m'} \in \{1, \dots, m\}, \qquad a_{m'+1}, \dots, a_{K+1-k} \in \{m + 1, \dots, K\}.$$

If an error is made at stage $k$, it can be one of the following two types:

1. The algorithm accepts $a_j$ at stage $k$ for some $j \ge m' + 1$.

2. The algorithm rejects $a_j$ at stage $k$ for some $j \le m'$.

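Again to slightly shorten the notation we denote $\sigma = \sigma_k$ for the bijection (from $\{1, \dots, K + 1 - k\}$
to $A_k$) such that $\hat{\mu}_{\sigma(1), n_k} \ge \hat{\mu}_{\sigma(2), n_k} \ge \dots \ge \hat{\mu}_{\sigma(K+1-k), n_k}$. Suppose Type 1 error occurs. Then
$a_j = \sigma(1)$ since if the algorithm accepts, it must accept the empirical best arm. Furthermore we
also have

$$\hat{\mu}_{a_j, n_k} - \hat{\mu}_{\sigma(m'+1), n_k} \ge \hat{\mu}_{\sigma(m'), n_k} - \hat{\mu}_{\sigma(K+1-k), n_k}, \qquad (2)$$

since otherwise the algorithm would rather reject arm $\sigma(K + 1 - k)$. The condition $a_j = \sigma(1)$ and
the event $\xi$ imply that

$$\hat{\mu}_{a_j, n_k} \ge \hat{\mu}_{a_1, n_k} \ \Rightarrow\ \mu_{a_j} + \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \ge \mu_{a_1} - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \ \Rightarrow\ \Delta_{(K+1-k)}^{\langle m \rangle} > \frac{1}{2} \Delta_{(K+1-k)}^{\langle m \rangle} \ge \mu_{a_1} - \mu_{a_j} \ge \mu_{a_1} - \mu_{m+1}.$$

We then look at the condition (2). On the event $\xi$, for all $i \le m'$, we have

$$\hat{\mu}_{a_i, n_k} \ge \mu_{a_i} - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \ge \mu_{a_{m'}} - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \ge \mu_m - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle}.$$

So there are $m' + 1$ arms in $A_k$ (namely $a_1, a_2, \dots, a_{m'}, a_j$) whose empirical means are at least
$\mu_m - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle}$, which means $\hat{\mu}_{\sigma(m'+1), n_k} \ge \mu_m - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle}$. On the other hand,
$\hat{\mu}_{\sigma(K+1-k), n_k} \le \hat{\mu}_{a_{K+1-k}, n_k} \le \mu_{a_{K+1-k}} + \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle}$. Therefore, using those two observations and (2) we deduce

$$\Big( \mu_{a_j} + \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \Big) - \Big( \mu_m - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \Big) \ge \Big( \mu_m - \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \Big) - \Big( \mu_{a_{K+1-k}} + \frac{1}{4} \Delta_{(K+1-k)}^{\langle m \rangle} \Big)$$

$$\Rightarrow\ \Delta_{(K+1-k)}^{\langle m \rangle} \ge 2 \mu_m - \mu_{a_j} - \mu_{a_{K+1-k}} > \mu_m - \mu_{a_{K+1-k}},$$

where the last step uses $\mu_m > \mu_{a_j}$, which holds because $a_j \in \{m + 1, \dots, K\}$.

Thus so far we proved that if there is a Type 1 error, then

$$\Delta_{(K+1-k)}^{\langle m \rangle} > \max\big( \mu_{a_1} - \mu_{m+1},\ \mu_m - \mu_{a_{K+1-k}} \big).$$

But at stage $k$, only $k - 1$ arms have been accepted or rejected, thus
$\Delta_{(K+1-k)}^{\langle m \rangle} \le \max\big( \mu_{a_1} - \mu_{m+1},\ \mu_m - \mu_{a_{K+1-k}} \big)$. By contradiction, we conclude that Type 1 error does not occur.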

Suppose Type 2 error occurs. The reasoning is symmetric to Type 1. In fact, if we rephrase the
problem as finding the $K - m$ worst arms instead of the $m$ best arms, this is exactly the same as
Type 1 error. Hence Type 2 error cannot occur either. This completes the induction and
consequently the proof of the theorem. ■
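To get a feeling for the bound of Theorem 1, one can evaluate it numerically. The sketch below does this for hypothetical values of $K$ and $H_2^{\langle m \rangle}$; the function names and the chosen numbers are assumptions for illustration, and the bound is capped at 1 since it is a bound on a probability.

```python
import math

def log_bar(K):
    """log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i, as in Figure 1."""
    return 0.5 + sum(1.0 / i for i in range(2, K + 1))

def sar_error_bound(n, K, h2):
    """Right-hand side of Theorem 1, capped at 1."""
    bound = 2 * K ** 2 * math.exp(-(n - K) / (8 * log_bar(K) * h2))
    return min(1.0, bound)

# Hypothetical instance: K = 4 arms with H_2 = 800.
for n in (10_000, 50_000, 100_000):
    print(n, sar_error_bound(n, K=4, h2=800.0))
```

The bound is vacuous for small budgets and only becomes informative once $n$ is a sufficiently large multiple of $\overline{\log}(K) H_2^{\langle m \rangle}$, which matches the statement that $e_n$ gets small when $n = \tilde{O}(H_2^{\langle m \rangle})$.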



4 Multi-bandit best arm identification 

In this section we use the idea of SAR for multi-bandit best arm identification. Here at the end
of each phase we estimate the gaps $\Delta_i(m)$ within each problem, and we reject the arm with the
largest such estimated gap. Moreover if a problem is left with only one active arm, then this arm
is accepted and the problem is deactivated. The corresponding strategy is described precisely in
Figure 2.
Let $A_1 = \{(1, 1), \dots, (K, M)\}$, $\overline{\log}(MK) = \frac{1}{2} + \sum_{i=2}^{MK} \frac{1}{i}$, $n_0 = 0$ and for $k \in \{1, \dots, MK - 1\}$,

$$n_k = \left\lceil \frac{1}{\overline{\log}(MK)} \, \frac{n - MK}{MK + 1 - k} \right\rceil.$$

For each phase $k = 1, 2, \dots, MK - 1$:

(1) For each active pair (arm, problem) $(i, m) \in A_k$, select arm $i$ in problem $m$ for $n_k - n_{k-1}$
rounds.

(2) Let $h_k(m)$ be the arm with the highest empirical mean $\hat{\mu}_{i, n_k}(m)$ among the active arms
in the active problem $m$ (that is such that $(i, m) \in A_k$).

(3) If there is a problem $m$ such that $h_k(m)$ is the last active arm in problem $m$, then
deactivate both the arm and the problem, and accept the arm. That is, set
$A_{k+1} = A_k \setminus \{(h_k(m), m)\}$ and $J_m = h_k(m)$. Otherwise proceed to step (4).

(4) Let $(i_k, m_k) \in \operatorname{argmax}_{(i, m) \in A_k} \big( \hat{\mu}_{h_k(m), n_k}(m) - \hat{\mu}_{i, n_k}(m) \big)$ (ties broken arbitrarily).
Deactivate arm $i_k$ in problem $m_k$, that is set $A_{k+1} = A_k \setminus \{(i_k, m_k)\}$.

Output: The $M$ accepted arms $(J_1, 1), \dots, (J_M, M)$ (where the last accepted arm is defined by
the unique element of $A_{MK}$).

Figure 2: SAR (Successive Accepts and Rejects) algorithm for the multi-bandit best arm identification.
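The steps of Figure 2 can be sketched in Python as follows. This is an illustrative reading of the figure under stated assumptions, not a reference implementation: the names `multi_bandit_sar` and `pull` are hypothetical, indices are 0-based, ties are broken by taking the first maximizer, and $n > MK$ is assumed.

```python
import math

def multi_bandit_sar(pull, K, M, n):
    """Sketch of multi-bandit SAR. pull(i, m) returns a reward in [0, 1].

    Returns (chosen, total_pulls) where chosen[m] is the arm selected in problem m.
    """
    N = M * K
    log_bar = 0.5 + sum(1.0 / i for i in range(2, N + 1))
    active = [(i, m) for m in range(M) for i in range(K)]
    sums = {p: 0.0 for p in active}
    counts = {p: 0 for p in active}
    chosen, n_prev = {}, 0
    for k in range(1, N):
        n_k = math.ceil((n - N) / (log_bar * (N + 1 - k)))
        for p in active:                       # step (1)
            for _ in range(n_k - n_prev):
                sums[p] += pull(*p)
                counts[p] += 1
        n_prev = n_k
        mean = {p: sums[p] / counts[p] for p in active}
        best, size = {}, {}
        for (i, m) in active:                  # step (2): empirical best per problem
            size[m] = size.get(m, 0) + 1
            if m not in best or mean[(i, m)] > mean[(best[m], m)]:
                best[m] = i
        single = [m for m in best if size[m] == 1]
        if single:                             # step (3): accept a lone arm
            m0 = single[0]
            chosen[m0] = best[m0]
            active.remove((best[m0], m0))
            continue
        # step (4): reject the pair with the largest empirical gap
        worst = max(active, key=lambda p: mean[(best[p[1]], p[1])] - mean[p])
        active.remove(worst)
    (i_last, m_last), = active                 # unique element of A_{MK}
    chosen[m_last] = i_last
    return chosen, sum(counts.values())

# Hypothetical usage with deterministic rewards equal to the means.
table = [[0.3, 0.5], [0.6, 0.55]]              # table[m][i] = mean of arm i in problem m
result, used_pulls = multi_bandit_sar(lambda i, m: table[m][i], K=2, M=2, n=100)
print(result, used_pulls)
```

Each phase deactivates exactly one pair, so after $MK - 1$ phases a single pair remains, which is the last accepted arm mentioned in the output line of Figure 2.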



Theorem 2 The probability of error of SAR in the multi-bandit best arm identification problem
satisfies

$$e_n \le 2M^2K^2 \exp\left( - \frac{n - MK}{8 \, \overline{\log}(MK) \, H_2^{[M]}} \right).$$



Proof Consider the event $\xi$ defined by

$$\xi = \left\{ \forall\, 1 \le i \le K,\ 1 \le m \le M,\ 1 \le k \le MK - 1,\ \left| \frac{1}{n_k} \sum_{s=1}^{n_k} X_{i,s}(m) - \mu_i(m) \right| \le \frac{1}{4} \Delta_{(MK+1-k)}^{[M]} \right\}.$$

Following the same reasoning as in the proof of Theorem 1, it suffices to show that on the event
$\xi$ the algorithm makes no error. We do this by induction on the phase $k$ of the algorithm. Let
$k \ge 1$. Assume the algorithm makes no error in all previous $k - 1$ stages. Then at phase $k$,
for all active problems $m$, the arm $(1, m)$ is still active. Moreover, as only $k - 1$ arms have been
deactivated, one clearly has

$$\max_{(i, m) \in A_k} \big( \mu_1(m) - \mu_i(m) \big) \ge \Delta_{(MK+1-k)}^{[M]}.$$
Suppose the above maximum is achieved for the arm $(i^*, m^*)$, so we have

$$\mu_1(m^*) - \mu_{i^*}(m^*) \ge \Delta_{(MK+1-k)}^{[M]}. \qquad (3)$$

Assume now that the algorithm makes an error at the end of phase $k$, i.e., some arm $(1, m)$ is
deactivated and it was not the last active arm in problem $m$. For this to happen, we necessarily
have for some $j \in \{2, \dots, K\}$ (e.g., $j = h_k(m)$),

$$\hat{\mu}_{j, n_k}(m) - \hat{\mu}_{1, n_k}(m) \ge \hat{\mu}_{1, n_k}(m^*) - \hat{\mu}_{i^*, n_k}(m^*). \qquad (4)$$

Clearly on the event $\xi$ one has

$$\hat{\mu}_{j, n_k}(m) - \hat{\mu}_{1, n_k}(m) = \hat{\mu}_{j, n_k}(m) - \mu_j(m) + \mu_j(m) - \mu_1(m) + \mu_1(m) - \hat{\mu}_{1, n_k}(m) < \frac{1}{2} \Delta_{(MK+1-k)}^{[M]}.$$

On the other hand, using (3) and $\xi$, one has

$$\hat{\mu}_{1, n_k}(m^*) - \hat{\mu}_{i^*, n_k}(m^*) = \hat{\mu}_{1, n_k}(m^*) - \mu_1(m^*) + \mu_1(m^*) - \mu_{i^*}(m^*) + \mu_{i^*}(m^*) - \hat{\mu}_{i^*, n_k}(m^*) \ge \frac{1}{2} \Delta_{(MK+1-k)}^{[M]}.$$

Therefore, $\hat{\mu}_{1, n_k}(m^*) - \hat{\mu}_{i^*, n_k}(m^*) > \hat{\mu}_{j, n_k}(m) - \hat{\mu}_{1, n_k}(m)$, contradicting (4). This completes the
induction and the proof. ■



5 Experiments 



In this section we revisit the simple experiments of Audibert et al. [2010] in the setting of multiple
identifications. Since our objective is simply to illustrate our theoretical analysis we focus on the
m-best arms identification problem, but similar numerical simulations could be conducted in the
multi-bandit setting and compared to the results of Gabillon et al. [2011].

We compare our proposed strategy SAR to three competitors. The uniform sampling strategy
divides the allocation budget $n$ evenly between the $K$ arms, and then returns the $m$ arms with
the highest empirical means (see Bubeck et al. [2011] for a discussion of this strategy in the single
best arm identification). The SR strategy is the plain Successive Rejects strategy of Audibert
et al. [2010], which was designed to find the (single) best arm. We slightly improve it for m-best
identification by running only $K - m - 1$ phases (while still using the full budget $n$) and then
returning the last $m$ surviving arms. Finally we consider the extension of UCB-E to the m-best
arms identification problem, which is based on an idea similar to the extension Gap-E of Gabillon
et al. [2011] for the multi-bandit best arm identification; see Figure 3 for the details. Note that
this last algorithm requires knowledge of the complexity $H^{\langle m \rangle}$. One could propose an adaptive version,
using ideas described in Audibert et al. [2010], but for the sake of simplicity we restrict our attention
to the non-adaptive algorithm.
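For reference, the uniform sampling baseline described above is simple enough to sketch in a few lines. The function name `uniform_sampling` and the callback `pull` are hypothetical, and the sketch assumes that any budget left over after an even split is simply not used.

```python
def uniform_sampling(pull, K, m, n):
    """Pull each arm floor(n / K) times, then return the m empirically best arms."""
    per_arm = n // K
    means = [sum(pull(i) for _ in range(per_arm)) / per_arm for i in range(K)]
    ranked = sorted(range(K), key=lambda i: means[i], reverse=True)
    return ranked[:m]

# Hypothetical usage with deterministic rewards equal to the means.
mus_demo = [0.5, 0.45, 0.4, 0.3, 0.2]
print(uniform_sampling(lambda i: mus_demo[i], K=5, m=2, n=100))
```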
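Parameter: exploration parameter $c > 0$.

For each round $t = 1, 2, \dots, n$:

(1) Let $\sigma_t$ be the permutation of $\{1, \dots, K\}$ that orders the empirical means, i.e.,
$\hat{\mu}_{\sigma_t(1), T_{\sigma_t(1)}(t-1)} \ge \hat{\mu}_{\sigma_t(2), T_{\sigma_t(2)}(t-1)} \ge \dots \ge \hat{\mu}_{\sigma_t(K), T_{\sigma_t(K)}(t-1)}$. For $1 \le r \le K$, define the empirical gaps

$$\hat{\Delta}_{\sigma_t(r), T_{\sigma_t(r)}(t-1)} = \begin{cases} \hat{\mu}_{\sigma_t(r), T_{\sigma_t(r)}(t-1)} - \hat{\mu}_{\sigma_t(m+1), T_{\sigma_t(m+1)}(t-1)} & \text{if } r \le m, \\ \hat{\mu}_{\sigma_t(m), T_{\sigma_t(m)}(t-1)} - \hat{\mu}_{\sigma_t(r), T_{\sigma_t(r)}(t-1)} & \text{if } r \ge m + 1. \end{cases}$$

(2) Draw

$$I_t \in \operatorname{argmax}_{i \in \{1, \dots, K\}} \ - \hat{\Delta}_{i, T_i(t-1)} + c \sqrt{\frac{n / H^{\langle m \rangle}}{T_i(t-1)}}.$$

Let $J_1, \dots, J_m$ be the $m$ arms with highest empirical means $\hat{\mu}_{i, T_i(n)}$.

Figure 3: Gap-E algorithm for the m-best arms identification problem.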
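The index rule of Figure 3 can be sketched as below. This is an illustration under explicit assumptions: the figure does not say what happens while some $T_i(t-1) = 0$, so the sketch pulls each arm once during the first $K$ rounds; the complexity is passed in as a plain parameter `H`; and the names `gap_e` and `pull` are hypothetical.

```python
import math

def gap_e(pull, K, m, n, c, H):
    """Sketch of Gap-E for m-best arms. Returns (selected_arms, counts).

    Assumes 1 <= m < K and n >= K. H is the complexity used to tune exploration.
    """
    sums, counts = [0.0] * K, [0] * K
    for t in range(1, n + 1):
        if t <= K:
            arm = t - 1                        # initialization round (assumption)
        else:
            mean = [sums[i] / counts[i] for i in range(K)]
            order = sorted(range(K), key=lambda i: mean[i], reverse=True)
            gap = [0.0] * K
            for r, i in enumerate(order):      # r is the 0-indexed empirical rank
                if r < m:
                    gap[i] = mean[i] - mean[order[m]]
                else:
                    gap[i] = mean[order[m - 1]] - mean[i]
            # Index: small estimated gap or few pulls both make an arm attractive.
            arm = max(range(K),
                      key=lambda i: -gap[i] + c * math.sqrt((n / H) / counts[i]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    mean = [sums[i] / counts[i] for i in range(K)]
    selected = sorted(range(K), key=lambda i: mean[i], reverse=True)[:m]
    return selected, counts

# Hypothetical usage with deterministic rewards equal to the means.
mu_demo = [0.5, 0.45, 0.4, 0.3, 0.2]
picked, pulls_per_arm = gap_e(lambda i: mu_demo[i], K=5, m=2, n=300, c=2.0, H=960.0)
print(sorted(picked), pulls_per_arm)
```

Unlike SAR, this rule needs a value for the complexity before it starts, which is the extra information referred to in the discussion of the experiments.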

In our experiments we consider only Bernoulli distributions, and the optimal arm always has
parameter 1/2. Each experiment corresponds to a different situation for the gaps: they are either
clustered in a few groups, or distributed according to an arithmetic or geometric progression.
For each experiment we plot the probability of misidentification for each strategy, varying $m$
between 2 and $K - 1$. The allocation budget for each experiment is chosen to be roughly equal to
the maximum over $m$ of $H_1^{\langle m \rangle}$. We report our results in Figure 4. The parameters for the experiments are as
follows:

• Experiment 1: One group of bad arms, $K = 20$, $\mu_{2:20} = 0.4$ (meaning for any $j \in \{2, \dots, 20\}$, $\mu_j = 0.4$).

• Experiment 2: Two groups of bad arms, $K = 20$, $\mu_{2:6} = 0.42$, $\mu_{7:20} = 0.38$.

• Experiment 3: Geometric progression, $K = 4$, $\mu_i = 0.5 - (0.37)^i$, $i \in \{2, 3, 4\}$.

• Experiment 4: 6 arms divided in three groups, $K = 6$, $\mu_2 = 0.42$, $\mu_{3:4} = 0.4$, $\mu_{5:6} = 0.35$.

• Experiment 5: Arithmetic progression, $K = 15$, $\mu_i = 0.5 - 0.025\,i$, $i \in \{2, \dots, 15\}$.

• Experiment 6: Three groups of bad arms, $K = 30$, $\mu_{2:6} = 0.45$, $\mu_{7:20} = 0.43$, $\mu_{21:30} = 0.38$.
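The six configurations listed above can be written down as explicit mean vectors, which is convenient for reproducing the simulations. The function name `experiment_means` is hypothetical; the values are exactly those in the list, with the optimal arm at parameter 1/2 in first position.

```python
def experiment_means():
    """Bernoulli parameters for Experiments 1 to 6 (arm 1 first)."""
    return {
        1: [0.5] + [0.4] * 19,
        2: [0.5] + [0.42] * 5 + [0.38] * 14,
        3: [0.5] + [0.5 - 0.37 ** i for i in range(2, 5)],
        4: [0.5, 0.42, 0.4, 0.4, 0.35, 0.35],
        5: [0.5] + [0.5 - 0.025 * i for i in range(2, 16)],
        6: [0.5] + [0.45] * 5 + [0.43] * 14 + [0.38] * 10,
    }

configs = experiment_means()
for idx, mu in configs.items():
    print("Experiment", idx, "K =", len(mu))
```

Several of these configurations contain tied means, so for some values of $m$ the set of top $m$ arms is not unique; this is the ambiguous case mentioned in Section 2.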

It is interesting to note that SR performs badly for m-best arms identification when $m > 1$: in
many cases it performs even worse than naive uniform sampling. This shows that
the tradeoffs involved in finding the single best arm and finding the top $m$ arms are fundamentally
different. As expected SAR always outperforms uniform sampling, and Gap-E performs slightly
better than SAR (but Gap-E requires extra information to tune its parameter, and the
adaptive version comes with no provable guarantee).
[Plot residue for Figure 4. Panel title: Experiment 6 (n = 50000). Horizontal axis ticks: 10 to 30. Legend: UNI, SR, SAR, Gap-E (c = 2).]


Figure 4: Numerical simulations for the m-best arms identification problem. We chose c = 2 (exploration 
parameter) for the Gap-E algorithm in all experiments. 